Baking the Telco LLM: Crafting the Ultimate Tuning Data Recipe
Introduction
Data is the lifeblood of any LLM, so creating a highly performant Telco LLM requires intimate knowledge of the Telco domain and data built specifically to teach Telco business logic. Just as it takes the finest ingredients to craft a delicious meal, data is the key ingredient for building and optimizing the Telco LLM. But ingredients alone are not enough: a well-designed “recipe” makes or breaks the dish. For the Telco LLM, that means deciding which data to add, in which quantities, and in which order. This is the story of the design decisions and data that went into crafting the highly tuned and optimized Telco LLM.
Data Overview
The tables below provide a high-level overview of the datasets that make up TelTask and TelInstruct. Datasets marked with a ‘*’ are core datasets and are key to creating a highly performant Telco LLM.
TelTask
TelInstruct
Data Filtering & Cleaning
The Telco LLM makes use of real service log data to learn Telco domain knowledge and Telco business logic. As such, it is of the utmost importance to properly clean and anonymize the data to prevent any leakage of personal information or proprietary knowledge. SKT takes data cleaning and anonymization incredibly seriously and follows the process below.
All personally identifiable information (PII), such as names, phone numbers, and bank account numbers, is replaced by an internal PII module. This module has been tuned and tweaked over years of using log data to train models for SKT products and services.
The PII module does a good job, but the replacements may not be natural. As such, SKT follows a double replacement process. After the PII module replaces the initial values, SKT uses GPT-4 to replace the PII targets again. GPT-4 creates natural, in-domain replacements, leading to data that is both high quality and fully anonymized.
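SKT's internal PII module and the exact GPT-4 replacement prompt are not public, so the sketch below is only a minimal illustration of the double-replacement idea: a stand-in regex pass masks PII with typed placeholders, then GPT-4 is asked to swap the placeholders for natural but fully fictional values. The function names, regex patterns, and prompt wording are assumptions.

```python
import re
from openai import OpenAI  # assumes the openai Python SDK and an OPENAI_API_KEY env var

client = OpenAI()

# Stand-in for SKT's internal PII module: mask detected PII with typed placeholders.
# The real module is proprietary; these regexes are only illustrative.
PII_PATTERNS = {
    "<PHONE>": re.compile(r"\b010-\d{3,4}-\d{4}\b"),
    "<ACCOUNT>": re.compile(r"\b\d{10,14}\b"),
}

def mask_pii(text: str) -> str:
    for placeholder, pattern in PII_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

def naturalize_placeholders(masked_text: str) -> str:
    """Second pass: ask GPT-4 to swap placeholders for natural, entirely fictional in-domain values."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Replace every <PHONE> or <ACCOUNT> placeholder with a realistic "
                        "but entirely fictional value. Do not change any other text."},
            {"role": "user", "content": masked_text},
        ],
    )
    return response.choices[0].message.content

anonymized = naturalize_placeholders(mask_pii("The customer at 010-3792-0678 asked about roaming."))
```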
While anonymized data is important, it is vital to have clean, high-quality data as well. As such, SKT employs a number of heuristics to ensure data quality. In particular:
SKT looks at conversation length (measured in turns) and filters out conversations that are too long or too short.
We also filter out conversations that contain too many filler words, such as “the” or “uh”.
Another important criterion is “success”: SKT purposely selects conversations that led to successful issue resolution or to conversions (services or products sold).
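The post describes these heuristics but not an implementation; the sketch below applies them to a hypothetical conversation record. The thresholds and record fields (`turns`, `resolved`, `converted`) are assumptions, not SKT's actual pipeline.

```python
# Minimal sketch of the filtering heuristics above; thresholds and field names are illustrative.
FILLERS = {"the", "uh", "um", "ah"}

def keep_conversation(conv: dict,
                      min_turns: int = 4,
                      max_turns: int = 60,
                      max_filler_ratio: float = 0.5) -> bool:
    turns = conv["turns"]  # list of utterance strings, one per speaker turn
    if not (min_turns <= len(turns) <= max_turns):
        return False       # filter: too short or too long

    tokens = [tok.lower() for utt in turns for tok in utt.split()]
    filler_ratio = sum(tok in FILLERS for tok in tokens) / max(len(tokens), 1)
    if filler_ratio > max_filler_ratio:
        return False       # filter: conversation is mostly filler words

    # Keep only "successful" conversations: issue resolved or a product/service sold.
    return conv.get("resolved", False) or conv.get("converted", False)

sample = {"turns": ["Hello, can I extend my roaming?", "Yes, after the current plan ends.",
                    "Okay, thank you.", "Thank you."],
          "resolved": True}
print(keep_conversation(sample))  # True
```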
Data Design
TelTask
SKT focused on two core datasets. First, we designed the TelTask dataset so the Telco LLM could understand basic Telco domain knowledge, general business terms, and the conversation flow between customers and customer service agents. The core data in the TelTask dataset consists of conversation summarization, intent recognition, and topic generation.
The first TelTask task is summarization, which trains the Telco LLM to summarize conversations between customers and customer service agents while capturing key content and using accurate Telco terminology. The main challenge is terminology whose meaning in the Telco domain differs slightly from everyday use. To meet this goal, the summarization data teaches the model to summarize natural conversations between customers and agents so that its output fits the context of the Telco domain.
Second, TelTask includes intent recognition data. The end goal of this data is to understand users’ requests and take action on them. This data is purposely designed with customer requests and the Telco domain in mind. By combining Telco domain knowledge with an understanding of user requests, the Telco LLM is able to take action and assist the user.
The third TelTask task is topic generation. Topics are concise, noun-type keywords that describe the core subjects of Telco consultation conversations. It is important to focus on specialized terms that refer to meaningful Telco products, services, and actions. Specificity is key: the generated topics should be as relevant and as specific as possible. For example, when discussing “roaming” for a trip to Thailand, topics should include infrequent but specific terms like “Baro 3GB Plan” rather than frequent topics such as “Thailand” or the generic topic “roaming.”
TelInstruct
While the TelTask data was designed to teach Telco domain knowledge, the TelInstruct dataset is constructed so the model understands the Telco business more deeply and can take action. The two core datasets in TelInstruct are Workflow and Machine Reading Comprehension (MRC).
Workflow data identifies when a “DB search or API call” is needed during a conversation between a customer and a customer service agent. This data is designed to address not just the “what” (the DB search or API call), but the “when” and the “how” as well. In other words, from this data the Telco LLM learns when to search the DB or call an API, how to do so, and what is needed to make such a call. Since Workflow data is based on real service log data, the Telco LLM learns to infer customer requirements even if a request is nebulous or noisy. With the inferred DB search or API call, the Telco LLM can take action and help resolve or fulfill customers’ needs.
MRC data is question-and-answer data about Telco documents that is meant to enhance the Telco LLM’s understanding of the Telco domain. By training the model on the variety of questions and answers that can come from Telco domain documents, the Telco LLM learns knowledge about the Telco domain and generates answers grounded in Telco documents. In particular, to improve understanding of long Telco documents, the Q&A task was structured so that the correct answer appears at various locations within a given document. Another MRC task, word-to-text, teaches the model to generate sentences containing Telco-specific domain words and phrases. For example, given a document, the task may be to generate a sentence with the following rate-plan related words: <T roaming>, <new subscription>, and <basic fee>.
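The MRC data format itself is not shown in the post; the sketch below assumes a simple (document, question, answer) layout, records where the answer falls so examples can be balanced across early, middle, and late answer positions, and builds a word-to-text instance like the rate-plan example above. Field names and instruction wording are assumptions.

```python
# Minimal sketch of two MRC-style instance builders; the schema is illustrative, not SKT's.

def qa_instance(document: str, question: str, answer: str) -> dict:
    """Build a Q&A instance and record the answer's relative position in the document,
    so the dataset can be balanced across answers appearing early, in the middle, or late."""
    idx = document.find(answer)
    position = ("unknown" if idx < 0
                else "early" if idx < len(document) / 3
                else "middle" if idx < 2 * len(document) / 3
                else "late")
    return {"task": "mrc_qa", "document": document, "question": question,
            "answer": answer, "answer_position": position}

def word_to_text_instance(document: str, target_words: list[str]) -> dict:
    """Build a word-to-text instance: generate a sentence that uses the given Telco terms."""
    instruction = ("Using the document above, write one sentence that contains all of the "
                   "following terms: " + ", ".join(target_words))
    return {"task": "word_to_text", "document": document, "instruction": instruction}

example = word_to_text_instance("...rate plan document...",
                                ["T roaming", "new subscription", "basic fee"])
```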
TelTask Data Examples
Summary
All summary data is manually reviewed to ensure quality, and the key aspects of a good summary are as follows:
Fluency: Each sentence must be natural and grammatical.
Coherence: The entire summary must be organized in a structured manner.
Consistency: The summary must contain factual information from the conversation.
Relevance: The summary must contain key content from the conversation.
Conciseness: The summary must not contain any unnecessary information.
To create data more efficiently, SKT uses the following prompt to generate an initial summary, which is then manually reviewed by a linguist.
Summary Prompt
You are an AI counselor specializing in telecommunication companies. Reflecting on the content below, summarize the conversation between the customer and the telecommunications company counselor.
1. Read the conversation and find the key points.
2. The key content is the customer's request and the information provided by the counselor.
3. Be sure to include terms related to major telecommunication companies (e.g. plan name, rate, date, discount price) in the core content.
4. In particular, be sure to include policies and terms related to major telecommunication companies as explained by your counselor.
5. Please write each key content concisely and easily to read according to the consultation flow.
6. Additional content other than core information is not created.
7. The overall summary is a complete text, and the sentence flow should be natural and structured.
8. The contents of the summary inquiry must be summarized accurately using the information in the conversation.
9. Do not include personal information (e.g. name, phone number) in the summary.
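The post shows the prompt but not the generation step; below is a minimal sketch, assuming the OpenAI Python SDK and an OPENAI_API_KEY, that drafts summaries with GPT-4 and writes them to a hypothetical review queue for the linguist pass. The file name and record fields are assumptions.

```python
import json
from openai import OpenAI  # assumes the openai Python SDK and an OPENAI_API_KEY env var

client = OpenAI()
SUMMARY_PROMPT = "You are an AI counselor specializing in telecommunication companies. ..."  # full prompt above

def draft_summary(conversation: str) -> str:
    """Generate an initial summary; a linguist reviews it before it becomes training data."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SUMMARY_PROMPT},
            {"role": "user", "content": conversation},
        ],
    )
    return response.choices[0].message.content

# Write drafts to a simple review queue for the manual linguist pass.
conversations: list[str] = []  # anonymized, filtered transcripts from the earlier steps
with open("summary_review_queue.jsonl", "w", encoding="utf-8") as f:
    for conv in conversations:
        record = {"conversation": conv, "draft_summary": draft_summary(conv), "reviewed": False}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```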
The following is an example of a typical conversation and a good summary of said conversation.
[Customer]: Hello, I extended my roaming on the 15th, and tomorrow is the 25th. Is it possible to extend today?
[Counselor]: Nice to meet you, this is Gu Subin.
[Counselor]: Hello.
[Counselor]: Excuse me, are you currently using the 30-day plan?
[Customer]: Yes, we extended it to the 15th. Last month, it was extended to the 15th as well.
[Counselor]: Then, I’m sorry.
[Counselor]: Yes, I'm sorry, customer, but our rate plan cannot be extended once the period is over while using the product.
[Customer]: Ah
[Counselor]: If you started on the 15th of last month, the 15th of this month is 30 days, so you can extend it again only after that period ends.
[Customer]: Oh, then do I have to do it again tomorrow?
[Counselor]: Yes, that’s right. Sorry.
[Customer]: Uh, tomorrow is Sunday, is that okay?
[Counselor]: Yes, our roaming department takes calls even on weekends, so you can contact us.
[Customer]: Oh, yes. Thank you.
[Counselor]: Thank you. It was Gu Subin.
[Customer]: Yes, thank you.

Summary: The customer asked about the possibility of extending their roaming service, and the customer service representative responded that the extension could only be processed after the current 30-day plan expired. The customer asked if it was possible to extend the plan today, since tomorrow is Sunday, and the customer service representative replied that the roaming department would be available to take calls over the weekend, so the customer could contact them at that time.
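The post does not show how a reviewed conversation and summary are packaged for tuning; the record below is a minimal sketch of one common chat-style SFT layout, reusing the prompt and example above. The field names and truncated strings are illustrative, not SKT's actual schema.

```python
import json

# Illustrative chat-style SFT record for the summarization task; schema is an assumption.
SUMMARY_PROMPT = "You are an AI counselor specializing in telecommunication companies. ..."  # full prompt shown earlier
conversation = "[Customer]: Hello, I extended my roaming on the 15th, and tomorrow is the 25th. ..."
reference_summary = "The customer asked about the possibility of extending their roaming service, ..."

summary_record = {
    "task": "summarization",
    "messages": [
        {"role": "system", "content": SUMMARY_PROMPT},
        {"role": "user", "content": conversation},
        {"role": "assistant", "content": reference_summary},  # linguist-reviewed target
    ],
}
print(json.dumps(summary_record, ensure_ascii=False, indent=2))
```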
Topic
Topic data is designed so that the model learns to generate Telco-specific, relevant topics from conversations. All topic data is manually reviewed to ensure quality, and the key aspects of good topics are as follows:
Inclusion: Topics must include all specific product and service names, as well as any key information in the conversation.
Type: Topics must consist of nouns or noun compounds.
Number: Each conversation should have between 1 and 5 topics.
Relevance: Meaningless topics that are not entailed by the conversation should not be included.
To create data more efficiently, SKT uses the following prompt to generate an initial topic list, which is then manually reviewed by a linguist.
Topic Prompt
You are a knowledgeable and smart Telco AI agent. You work on creating topic keywords to help counselors understand the key topics of counseling at a glance and easily search them. Briefly summarize the consultation conversation and create topic keywords that represent the conversation. Please select topic keywords based on terms related to major telecommunication companies mentioned during the conversation (e.g., additional services, rate plans, number portability, etc.) and specific product names (e.g., uju pass, v coloring, secure login service, etc.). Topic keywords must cover the entire consultation content and must include the core content of the conversation. Topic keywords must be composed of simple noun keywords, and generate about 3 to 4 keywords.
The following is an example of a typical conversation and a good topic list for said conversation.
[Counselor]: Nice to meet you, this is [Kim Ji-won].
[Customer]: Yes, I think I signed up for the secure login service with my phone number by mistake yesterday.
[Counselor]: Oh, let me check. Are you [Lee Se-hyeon], and is the number you are inquiring about [010-3792-0678]?
[Customer]: Yes, that's right.
[Counselor]: Thank you for your confirmation. I checked what you said and confirmed that you were registered. I will ask you one more piece of information and process the cancellation. What is your address? [Guri-si, Gyeonggi-do]?
[Customer]: [Hakdong 3-ga].
[Counselor]: Thank you for your confirmation. Yes, customer. Since I canceled this service right away, I completed the process so you will not be charged.
[Customer]: Yes, thank you.

Topics: ['Secure login service', 'Service subscription error', 'Charge'*, 'Customer confirmation']
* Topic keywords for content that did not actually occur in the conversation are deleted.
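Some of the criteria above can be checked automatically before a linguist reviews each item; below is a small sketch of such pre-checks. The phrase-length heuristic for the noun-keyword rule is an assumption, and the final relevance call, such as deleting 'Charge' above, stays manual.

```python
# Automatic pre-checks mirroring the "Number" and "Type" criteria above; the relevance
# judgement (e.g., deleting 'Charge' because no charge actually occurred) stays with
# the human reviewer.
def check_topics(topics: list[str]) -> list[str]:
    problems = []
    if not (1 <= len(topics) <= 5):
        problems.append(f"Number: expected 1-5 topics, got {len(topics)}")
    for topic in topics:
        if len(topic.split()) > 4:
            problems.append(f"Type: '{topic}' looks like a sentence, not a noun keyword")
    return problems  # an empty list means the item can go on to linguist review

print(check_topics(["Secure login service", "Service subscription error", "Customer confirmation"]))
# -> []  (passes the automatic checks; relevance is still reviewed manually)
```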
TelInstruct Data Examples
Workflow
All workflow data is manually reviewed to ensure quality, and the key aspects of the data are as follows:
Level: Search queries and API calls should be as clear and concise as possible.
Tagging: Only add “search” / API call annotations to utterances that require said calls.
Timing: Tag each search / call as consistently as possible with respect to when it occurs (i.e., which turn in the conversation).
Query: Search terms must consist of words appearing in the customer utterances and use representative Telco terminology.
To create data more efficiently, SKT uses the following prompt to generate initial query annotations, which are then manually reviewed by a linguist.
Query Creation Prompt
After the conversation above, it is counselor's turn to respond. Then counselor will search some query. Here are the given search results from the internal Telecom Database, enclosed by |
The following is an example of an annotated “workflow” conversation.
(counselor) Nice to meet you. I’m Eric
(customer) I’d like to extend my optional contract
Query: [optional contract extension, contract extensions, contract renewal]
DB > Various contract extension fulfillment docs > Optional Contract/join/cancel(1)
(counselor) The discount is the same for both 1 and 2-year contracts. How to proceed?
(customer) Let’s go with 1 year
Query: [Optional contract 1-year extension, Discounted contract 1-year extension]
Function call > Contract product subscription/cancellation > Select > Select task: Contract subscription + Contract transfer > Select contract: Select contract discount > Select month: 12 months > Save
DB > Optional contract discount > Summary
DB > Optional contract discount > Swing/Processing
DB > Optional contract discount > Text notification
(counselor) Okay. For detailed information on optional contract discounts, please refer to the sign-up notice sent to your mobile phone. If the contract is canceled …
(customer) Okay. Thank you!
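The annotated conversation above is shown as free text; a machine-readable form of the second customer turn might look like the hypothetical record below. The field names (`query`, `actions`) are assumptions, while the queries and the DB / function-call paths are taken from the example itself.

```python
# Hypothetical machine-readable form of the "Let's go with 1 year" turn above.
workflow_turn = {
    "speaker": "customer",
    "utterance": "Let's go with 1 year",
    "query": ["Optional contract 1-year extension", "Discounted contract 1-year extension"],
    "actions": [
        {"type": "function_call",
         "path": "Contract product subscription/cancellation > Select > "
                 "Select task: Contract subscription + Contract transfer > "
                 "Select contract: Select contract discount > Select month: 12 months > Save"},
        {"type": "db_search", "path": "Optional contract discount > Summary"},
        {"type": "db_search", "path": "Optional contract discount > Swing/Processing"},
        {"type": "db_search", "path": "Optional contract discount > Text notification"},
    ],
}
```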
Take-Aways
The path has had its twists and turns, but we’ve finally landed on a recipe and ingredients that create a delicious, nutritious Telco LLM. The most important lessons learned are as follows:
It’s all about the data - without high-quality data and a well-designed data “recipe”, tuning will have deleterious effects.
Quality is more important than quantity - we’ve seen that performance plateaus after a few thousand examples. It’s much more important to have clean, high-quality data.
Data variety is important - our first models lacked variety in data and tasks, and we ended up with models that were decent at the tasks at hand but pretty poor at everything else. With smaller, more varied datasets, we’ve seen much better results.